*Generates the crosswalk between IQVIA Provider ID and NPI
*Written for Stata 15
version 15

set more off

*one-time step: import the raw IQVIA provider file and save it as .dta (uncomment to rerun)
*import delimited AMA_PROVIDER.txt, varnames(1) clear
*save princeton_AMA_PROVIDER.dta, replace

*load the saved IQVIA (AMA) provider file
use princeton_AMA_PROVIDER.dta, clear

*rename provider name variables to match the NPPES merge keys
rename provider_first first_name
rename provider_last last_name

*number the records within each first name, last name, state, and city combination
*and keep only the first record of each combination
bysort first_name last_name provider_state provider_city: gen i_name_state_city=_n
keep if i_name_state_city==1 // keeps 99.5% of obs
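*Sketch (a check added for illustration, not part of the original logic): after the
*keep above, the four name/location variables should uniquely identify each row.
*isid stops the do-file with an error if they do not; missok permits missing (blank)
*values. Note that bysort does not fix the order of records within a group, so which
*duplicate survives is not guaranteed to be the same across runs. A tie-breaker in
*parentheses, e.g. bysort first_name last_name provider_state provider_city (provider_id),
*would keep the lowest provider_id in each group, assuming provider_id differs within it.
isid first_name last_name provider_state provider_city, missok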

*copy provider state into prac_state, the state variable name used in the NPPES file
gen prac_state=provider_state

*merge with the NPPES file on first name, last name, and practice-address state
merge m:m first_name last_name prac_state using npidata_cleaned_main.dta
keep if _merge!=2 // drop NPPES records with no match in the AMA file
count if _merge==1 // 409,729 AMA records could not be matched
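*Illustration on toy data (hypothetical names and IDs; wrapped in preserve/restore so
*the working data are untouched). Stata's [D] merge manual cautions that merge m:m is
*rarely what is intended: within a key group it pairs the 1st master record with the
*1st using record, the 2nd with the 2nd, and so on, reusing the last record of the
*shorter side, rather than forming every pairwise combination. joinby forms all
*combinations. With two AMA and two NPPES records sharing one name+state key,
*merge m:m returns 2 rows while joinby returns 2 x 2 = 4.
preserve
clear
input str5 first_name str5 last_name str2 prac_state long npi
"ANN" "LEE" "NJ" 111
"ANN" "LEE" "NJ" 222
end
tempfile toy_npi
save `toy_npi'
clear
input str5 first_name str5 last_name str2 prac_state long provider_id
"ANN" "LEE" "NJ" 1
"ANN" "LEE" "NJ" 2
end
tempfile toy_ama
save `toy_ama'
*sequential pairing within the key: 2 rows
merge m:m first_name last_name prac_state using `toy_npi'
assert _N == 2
*all pairwise combinations within the key: 4 rows
use `toy_ama', clear
joinby first_name last_name prac_state using `toy_npi'
assert _N == 4
restore
*If every name+state pairing is what is wanted, a possible replacement for the merge
*line above (left commented out so the pipeline, and the counts noted in its
*comments, are unchanged); with unmatched(master), _merge is 1 or 3:
*joinby first_name last_name prac_state using npidata_cleaned_main.dta, unmatched(master) _merge(_merge)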

*count merged records per name+state key; mult_match>1 flags keys with multiple matches
bysort first_name last_name prac_state: gen mult_match=_N
count if mult_match==1 & _merge==3 // 1,062,816 uniquely matched at the name+state level
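*Sketch (added diagnostic): the complementary count of matched records whose
*name+state key occurs more than once, i.e. ambiguous at the name+state level.
*Together with the count above it accounts for every matched record.
count if mult_match>1 & _merge==3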

*count records per AMA provider_id
bysort provider_id: gen check=_N
count if check==1 & _merge==3 // 1,148,339 uniquely matched at the provider_id level (~70.5% of the AMA file)
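*Sketch (added diagnostic): express the unique matches as a share of distinct
*provider_id values. Using distinct provider_ids as the base is an assumption; the
*~70.5% noted above may rest on a different denominator. tag_pid is a temporary
*helper variable introduced here and dropped afterwards.
egen byte tag_pid = tag(provider_id)
quietly count if tag_pid
local n_pid = r(N)
*each row with check==1 is a distinct provider_id
quietly count if check==1 & _merge==3
display "uniquely matched provider_ids: " r(N) " of " `n_pid' " (" %4.1f 100*r(N)/`n_pid' "%)"
drop tag_pid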

*count records per NPI
bysort npi: gen check_npi=_N
count if check_npi==1 // 1,275,983 uniquely matched at the NPI level
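*Sketch (added diagnostic): the keep below requires provider_id uniqueness only. If a
*strictly one-to-one crosswalk were wanted, with each NPI also appearing once, the
*stricter condition would be the following (shown as a count; nothing is dropped):
count if check==1 & check_npi==1 & _merge==3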

*keep only matched records whose provider_id appears exactly once
*(NPI uniqueness, check_npi, is not required)
keep if check==1 & _merge==3
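*Sketch (added check): every remaining record is a match and provider_id identifies
*rows uniquely, so the key can later be merged m:1 on provider_id. missok tolerates a
*missing provider_id; remove it to also forbid missing IDs.
assert _merge==3
isid provider_id, missok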
	
*keep only the crosswalk variables
keep provider_id npi

*save final crosswalk file
save provider_AMA_NPI_matched_KEY.dta, replace
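*Sketch of how the saved key might be used (illustrative; claims_file.dta is a
*hypothetical dataset that carries provider_id). First reopen the key and confirm its
*structure, then attach NPIs with an m:1 merge (left commented out):
use provider_AMA_NPI_matched_KEY.dta, clear
describe
isid provider_id, missok
*use claims_file.dta, clear
*merge m:1 provider_id using provider_AMA_NPI_matched_KEY.dta, keep(master match)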